Abstract —The energy demands of Ethernet links have been
an active focus of research in recent years. This work has
enabled a new generation of Energy Efficient Ethernet (EEE)
interfaces able to adapt their power consumption to the actual
traffic demands, thus yielding significant energy savings. With
the energy consumption of single network connections being a
solved problem, in this paper we focus on the energy demands of
link aggregates that are commonly used to increase the capacity
of a network connection. We build on known energy models
of single EEE links to derive the energy demands of the whole
aggregate as a function of how the traffic load is spread among its
powered links. We then provide a practical method to share the
load that minimizes overall energy consumption with controlled
packet delay, and prove that it is valid for a wide range of EEE
links. Finally, we validate our method with both synthetic and
real traffic traces captured in Internet backbones.

Index Terms —Network Interfaces, Link aggregation, Optimization
methods, Energy efficiency


I. Introduction

ENERGY CONSUMPTION is nowadays a global source
of concern for both economic and environmental reasons. 
Networking equipment alone consumes 1.8% of the world’s 
electricity, and that number is currently increasing at a 10% 
rate annually [1]. If we just focus on data centers, between 
15 and 20% of electricity is used for networking [2]. These 
reasons are spurring the development of more power efficient 
networking equipment. 

A direct result of these efforts is the IEEE 802.3az standard [3],
which provides a new idle mode for Ethernet physical
interfaces. This new mode only needs a small fraction of
the power used in normal operation, but no traffic can be
transmitted or received while the interface stays in the idle
mode. Since there is an implicit trade-off between energy
consumption and frame delay, these new Energy Efficient
Ethernet (EEE) interfaces need a governor that decides when
to enter and exit this idle mode. In fact, several alternatives
have already been proposed in the literature [4], [5], [6], [7]
and later validated by both empirical [8], [9] and
analytic means [10], [11], [12], [13], [14]. These works have
provided us with the tools needed to accurately estimate the
power savings of EEE for any arrival traffic pattern with the
most prevalent idle mode governors, and to properly tune them
to maximize energy savings.

With the energy consumption problem of single Ethernet
links mostly solved, in this paper we focus on the power
demands of network connections formed by multiple EEE
links, either by link aggregation [15] or by some other proprietary
means. Even though EEE saves energy in the
individual components of the bundle, the global consumption
of an aggregate may be severely affected by how the incoming
traffic is shared among its powered-up links. In fact, the power
profiles of the individual EEE links are not linear, as their
energy demands do not grow proportionally to the offered
load. This makes the overall power consumption dependent
on the actual traffic share among the links of the aggregate.

The main goal of this paper is to obtain the optimum
share of traffic among the links of an aggregate from an
energy efficiency perspective. As far as we know, this is
the first paper to tackle this issue. We propose a water-filling
algorithm, where traffic is only transmitted on a given
link if all the previous ones are already being used at their
maximum capacity, and show that it is optimum for various
relevant traffic arrival patterns. Additionally, we propose
a practical implementation of the algorithm that can be applied
with minimal computational needs in the firmware of Ethernet
line cards.

The rest of this paper is organized as follows. We introduce 
some work related to Energy Efficient Ethernet in Section II. 
Section III provides a formal description of the problem at 
hand. Section IV analyzes the concavity of the cost function 
of the main EEE algorithms. Section V details a practical
algorithm to implement water-filling. The results are discussed
in Section VI. Finally, Section VII ends the paper with our
conclusions.

II. Related Work 

Several areas where energy can be saved in the current
Internet were first identified in [16]. The existence
of spare installed capacity was one of them.
Several works proposed powering off unused links during
low load periods, concentrating traffic on just a few network
paths [17], [18], [19], [20], [21]. Of all these proposals, [19],
[20] also take into consideration aggregated links between
two network devices. However, all these works focus on long
timescales, usually hours, while we are interested in much
shorter timescales; as such, both approaches can be seen as
complementary. Links (and network paths) can be powered off
when the long-term traffic load is low enough while, at
short timescales, another approach is needed to reduce
the energy usage of those links in the aggregate that remain
active.

Another source of inefficiency identified in [16] was the
physical interfaces of network devices. At that time, physical
interfaces drew a constant amount of power regardless of the
actual traffic load. Preliminary works tried to mitigate this
either by adapting the transmission speed [22], with lower
speeds demanding less power, or by briefly switching off the
physical interfaces when there is little or no traffic
to send [4], [5]. Finally, the IEEE 802.3az standard [3] was
approved, providing a new low power mode for physical
Ethernet interfaces that can be used when there is no need
to send traffic.

New research then focused on the best way to use this new
low power mode. The straightforward solution, entering low
power mode as soon as all traffic has been transmitted and
returning to normal mode with the first packet arrival,
called frame transmission, was experimentally studied in [9].
A first analytic study appeared in [23] for Poisson traffic, while
another analysis that considers arrivals of packet trains, to take
bursty traffic into account, was presented in [12].

Another explored possibility to make use of the new power
mode consists of waiting for the arrival of several packets
before returning to active mode [6], [14]. This mode, known
as packet coalescing or burst transmission, avoids unnecessary
transitions between the normal and low power modes, greatly
improving the energy savings at the cost of additional delay.
There exist analytic models for the power savings of burst
transmission with Poisson traffic [11] and for its delay [24],
[25]. A model for general arrival patterns, covering both
power usage and delay for frame and burst transmission,
can be found in [13].

More recent research looks for innovative ways to govern the
use of the low power mode; see, for instance, [26], which exploits
traffic self-similarity to obtain the duration of the low
power interval that maximizes energy savings
for a given maximum allowable additional delay.

III. Problem Description 

In transmission networks, it is customary to bundle several
homogeneous links, i.e., links with similar transmission
technology, as a cheap way of scaling up the aggregate
transmission rate between two endpoints. The bundle can be seen and
managed either as a set of independent links or as a unit by the
traffic management algorithms and the upper layer protocols.
In the latter case, the traffic is split among the individual
links in the bundle so as to optimize a given
performance metric. In this paper we focus on the optimum
allocation of traffic when the bundle components are Energy
Efficient Ethernet (EEE) links (IEEE 802.3az [3]), from the
point of view of total energy consumption minimization. The
energy consumption profile of EEE links has been analyzed
in many works [8], [9], [10], [11], [12], [13], [27], [24],
and has been shown to be highly sensitive to the statistical
variability of the incoming traffic. Thus, further gains in energy
efficiency may be realized if the total traffic load offered to
the bundle is properly allocated to individual links.

We consider a bundle comprising N identical transmission
links. The traffic demand offered to the bundle is X, and $E(x_i)$ is the
energy consumption of link $i = 1, \ldots, N$, where $x_i$ stands for
the traffic rate on that link. Link capacities are denoted by $C_i$,
for $i = 1, \ldots, N$ (all links being identical, $C_i = C$).

Our goal is to minimize the overall consumption of the
bundle $E_b(x_1, \ldots, x_N) = \sum_{i=1}^{N} E(x_i)$, that is

$$\min_{x_1,\ldots,x_N} E_b(x_1, \ldots, x_N) \qquad (1)$$

such that

$$C \geq x_i \geq 0, \quad i = 1, \ldots, N, \qquad (2)$$

and

$$\sum_{i=1}^{N} x_i = X, \qquad (3)$$

where

$$E(x_i) = 1 - (1 - \sigma_{\text{off}})(1 - \rho_i)\,\frac{T_{\text{off}}(\rho_i)}{T_{\text{off}}(\rho_i) + T_s + T_w} \qquad (4)$$

is the normalized energy consumption of link i, as shown
in [13]. In (4), $T_s$ and $T_w$ are constants that account for,
respectively, the transition times needed to enter and exit the
idle mode defined in IEEE 802.3az. $T_{\text{off}}(\rho_i)$ is the average
time spent by the interface in the idle state for a given input
load. Note that $T_{\text{off}}(\rho_i)$ depends on both the actual traffic
arrival pattern and the idle state governor. Finally, $\sigma_{\text{off}}$ is
simply the fraction of energy consumed by the interface in the
idle state compared to its energy consumption in the active
state, and $\rho_i = x_i/C$ is the normalized traffic load on the
link. So, (1) is a standard minimization problem amenable
to analysis, provided that $E_b(x_1, \ldots, x_N)$ is a well-behaved
function.
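Model (4) is easy to evaluate once an idle-time function is chosen. The sketch below is a minimal illustration, assuming the 10GBASE-T constants listed in Section VI ($T_s$, $T_w$, $\sigma_{\text{off}}$) and, as an example governor, frame transmission with Poisson arrivals and 1000-byte frames (Eq. (9) in Section IV). The function names are ours and purely illustrative.

```python
import math

# Assumed constants: the 10GBASE-T values used in Section VI
T_S = 2.88e-6      # time to enter the idle mode (s)
T_W = 4.48e-6      # time to exit the idle mode (s)
SIGMA_OFF = 0.1    # idle power as a fraction of active power


def link_energy(rho, t_off):
    """Eq. (4): normalized energy of one EEE link at load rho in [0, 1].

    t_off(rho) returns the mean idle time T_off in seconds. For rho = 0 we use
    the limit E = sigma_off, assuming T_off(rho) grows without bound as rho -> 0.
    """
    if rho == 0:
        return SIGMA_OFF
    f = t_off(rho)
    return 1 - (1 - SIGMA_OFF) * (1 - rho) * f / (f + T_S + T_W)


def bundle_energy(rates, capacity, t_off):
    """E_b = sum_i E(x_i), with rho_i = x_i / C for identical links."""
    return sum(link_energy(x / capacity, t_off) for x in rates)


# Example idle-time model: frame transmission with Poisson arrivals (Eq. (9)),
# 1000-byte frames on a 10 Gb/s link; MU is the full-load frame rate (frames/s).
MU = 10e9 / (1000 * 8)


def t_off_frame_poisson(rho):
    return math.exp(-MU * rho * T_S) / (MU * rho)


concentrated = bundle_energy([10e9, 0.0], 10e9, t_off_frame_poisson)
equal_split = bundle_energy([5e9, 5e9], 10e9, t_off_frame_poisson)
print(concentrated, equal_split)
```

Under these assumptions, carrying 10 Gb/s on a single link costs 1.1 (one fully loaded link plus one idle link at $\sigma_{\text{off}}$), whereas an even split costs about 1.97, which is the nonlinearity the rest of the paper exploits.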


A. Optimum allocation 

In this Subsection, we prove that for certain functions
$T_{\text{off}}(\cdot)$ the solution to the optimum allocation is a simple
sequential water-filling algorithm: each link capacity is fully
used before sending traffic through a new, idle link. Clearly,
(1) is a concave separable optimization problem when the
objective function is concave, and we have the following simple
result.

Proposition 1. If $E(\cdot)$ is a strictly concave function and
$\{C_1, C_2, \ldots, C_N\}$, with $C_i \geq C_{i+1}$, are the link capacities,
then $E_b(x_1, \ldots, x_N)$ is minimum if $x_i = \min\{C_i,\, X - \sum_{j<i} x_j\}$.

Proof. The proof is a direct consequence of the subadditivity
of $E(\cdot)$ and is given in Appendix A. □
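Proposition 1 translates directly into a sequential allocation rule. A minimal sketch, assuming the demand and the capacities are expressed in the same units (e.g., Gb/s); the function name `water_fill` is hypothetical:

```python
def water_fill(demand, capacities):
    """Sequential water-filling of Proposition 1.

    Links are filled in order of decreasing capacity, and the load of each
    link is returned in that order. Raises ValueError if the demand exceeds
    the total capacity of the bundle.
    """
    if demand > sum(capacities):
        raise ValueError("demand exceeds the total bundle capacity")
    allocation = []
    remaining = demand
    for cap in sorted(capacities, reverse=True):
        x = min(cap, remaining)   # x_i = min{C_i, X - sum_{j<i} x_j}
        allocation.append(x)
        remaining -= x
    return allocation


print(water_fill(25, [10, 10, 10, 10]))   # [10, 10, 5, 0]
```

Each link is saturated before the next one receives any traffic, so at most one link in the bundle is partially loaded.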

Now we derive sufficient conditions for the concavity of the
cost function $E(\cdot)$. Recall from (4) that $E(\cdot)$ depends on some
constants related to the interface hardware and on the statistical
variability of the incoming traffic. We will try to understand
which conditions $T_{\text{off}}$, the only traffic-dependent term, must
satisfy. For clarity and simplicity, in the following
we use the notation $f(\rho) = T_{\text{off}}(\rho)$ and $t(\rho) = E(\rho)$. We
will further assume that $f(\rho)$ is decreasing¹ and continuously
differentiable in $\rho \in (0,1)$.

Proposition 2. Let $f(x)$ be a function $f : [0,1] \to \mathbb{R}^{+}$,
decreasing and with continuous derivatives. Let $a, b > 0$ and
consider the function

$$t(x) = 1 - a(1 - x)\,\frac{f(x)}{f(x) + b}. \qquad (5)$$

Under these definitions, $t(x)$ is concave if

$$f''(x)\,\big(f(x) + b\big) > 2\,\big(f'(x)\big)^2. \qquad (6)$$

Proof. The proof is provided in Appendix B. □

Proposition 2 applies trivially to the function $E(\cdot)$ by setting
$a = 1 - \sigma_{\text{off}}$ and $b = T_s + T_w$, so we have derived a
simple sufficient condition for the $T_{\text{off}}(\cdot)$ term that makes $E(\cdot)$
concave and the optimization problem easily solvable.
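The role of (6) can be seen with a short calculation. The following is our own sketch of the argument under the hypotheses of Proposition 2, not a reproduction of Appendix B. Writing $g(x) = f(x)/(f(x)+b)$, so that $t(x) = 1 - a(1-x)\,g(x)$,

$$g'(x) = \frac{b\,f'(x)}{(f(x)+b)^2}, \qquad g''(x) = \frac{b\left[f''(x)\,(f(x)+b) - 2\,(f'(x))^2\right]}{(f(x)+b)^3},$$

$$t''(x) = -a\left[(1-x)\,g''(x) - 2\,g'(x)\right].$$

Because f is decreasing, $g'(x) \leq 0$, and (6) gives $g''(x) > 0$. For $x \in [0,1]$ both terms inside the brackets are then non-negative, so $t''(x) \leq 0$ and t is concave. In other words, (6) is exactly the convexity condition for $h = f/(f+b)$.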

IV. Analysis of Frame and Burst Transmission 

In this Section we check whether the known formulas for
the average sleeping time in EEE satisfy the condition of
Proposition 2. According to [13], the time $T_{\text{off}}(\cdot)$ depends
both on the incoming traffic characteristics and on the threshold
algorithm used to switch between the idle and the active states
in the Ethernet interface. There are two main approaches, the
frame transmission algorithm and the burst transmission one,
which we consider next.

A. Frame Transmission 

Frame transmission is a straightforward use of the idle
mode. Under frame transmission, the physical interface is put
in idle mode as soon as the last frame in the queue has been
transmitted, and normal operation is restored as soon as new
traffic arrives at the networking interface. For many common
traffic patterns this operating mode does not produce great
energy savings, as every change of operating mode incurs a
transition period that draws some energy.
From [13], for the frame transmission algorithm

$$T_{\text{off}}^{\text{frame}}(\rho) = \int_{T_s}^{\infty} (t - T_s)\, f_{\rho,T_e}(t)\, dt, \qquad (7)$$

where $f_{\rho,T_e}(t)$ denotes the probability density function, for
traffic load $\rho$, of the empty period, i.e., the time elapsed
from the moment the queue empties until the subsequent first arrival.
When $f_{\rho,T_e}(t)$ is unknown, from [13] equation (7) can be
approximated by

$$T_{\text{off}}^{\text{frame}}(\rho) \approx \frac{1}{\mu\rho} - T_s, \qquad (8)$$

with $\mu^{-1}$ the average packet transmission duration. Closed
formulas exist when the arrival process follows a Poisson or
a deterministic distribution. In particular, for Poisson arrivals,
we have

$$T_{\text{off}}^{\text{frame}}(\rho) = \frac{e^{-\mu\rho T_s}}{\mu\rho}. \qquad (9)$$
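For Poisson arrivals the step from (7) to (9) is short. A sketch, assuming that by memorylessness the empty period is exponentially distributed with rate $\lambda = \mu\rho$, i.e., $f_{\rho,T_e}(t) = \lambda e^{-\lambda t}$, and substituting $s = t - T_s$:

$$T_{\text{off}}^{\text{frame}}(\rho) = \int_{T_s}^{\infty} (t - T_s)\,\lambda e^{-\lambda t}\,dt = e^{-\lambda T_s}\int_{0}^{\infty} s\,\lambda e^{-\lambda s}\,ds = \frac{e^{-\lambda T_s}}{\lambda} = \frac{e^{-\mu\rho T_s}}{\mu\rho},$$

where the remaining integral is the mean $1/\lambda$ of the exponential distribution.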


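1) Poisson traffic: To prove concavity under the
assumption of Poisson arrivals, we let $f(\rho) = e^{-\mu\rho T_s}/(\mu\rho)$,
as given by (9), and substitute it in (6) with $b = T_s + T_w$. The
result is the condition

$$\left(\frac{\mu T_s^2 e^{-\mu\rho T_s}}{\rho} + \frac{2T_s e^{-\mu\rho T_s}}{\rho^2} + \frac{2e^{-\mu\rho T_s}}{\mu\rho^3}\right)\left(\frac{e^{-\mu\rho T_s}}{\mu\rho} + T_s + T_w\right) > 2\left(\frac{T_s e^{-\mu\rho T_s}}{\rho} + \frac{e^{-\mu\rho T_s}}{\mu\rho^2}\right)^2 \qquad (10)$$

and after some routine simplifications this reduces to

$$(T_s + T_w)\,e^{\mu\rho T_s}\big(2 + \mu\rho T_s(2 + \mu\rho T_s)\big) > T_s(2 + \mu\rho T_s). \qquad (11)$$

But $\mu\rho T_s > 0$ and $e^{\mu\rho T_s} > 1$, so

$$(T_s + T_w)\,e^{\mu\rho T_s}\big(2 + \mu\rho T_s(2 + \mu\rho T_s)\big) > T_s\,e^{\mu\rho T_s}\big(2 + \mu\rho T_s(2 + \mu\rho T_s)\big) > T_s(2 + \mu\rho T_s),$$

and (6) is satisfied.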
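The "routine simplifications" between (10) and (11) can be made explicit. A sketch, writing $u = \mu\rho$ for brevity (this shorthand is ours):

$$f = \frac{e^{-uT_s}}{u}, \qquad f' = -\frac{\mu\,e^{-uT_s}(1 + uT_s)}{u^2}, \qquad f'' = \frac{\mu^2 e^{-uT_s}(2 + 2uT_s + u^2T_s^2)}{u^3}.$$

Multiplying both sides of (6) by the positive quantity $u^4 e^{2uT_s}/\mu^2$ gives

$$(2 + 2uT_s + u^2T_s^2)\left(1 + b\,u\,e^{uT_s}\right) > 2(1 + uT_s)^2,$$

and subtracting $2 + 2uT_s + u^2T_s^2$ from both sides and dividing by $u > 0$ leaves

$$b\,e^{uT_s}\big(2 + uT_s(2 + uT_s)\big) > T_s(2 + uT_s),$$

which is (11) once $b = T_s + T_w$ and $u = \mu\rho$ are substituted back.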

Note that it is important to ascertain that the link consumption
function $E(\cdot)$ is concave for Poisson traffic since,
although Poisson models are not generally suitable for network
traffic, they are reasonably valid for real traffic at sub-second
timescales [28] and also for aggregated traffic in the Internet
core [29]. In any case, in Section VI we test the validity of our
assumptions with both synthetic and real traffic traces collected
in Internet links.

Figure 1 shows, for purposes of illustration, a contour plot of
$h''(\rho)$ for the function
$h(\rho) = T_{\text{off}}^{\text{frame}}(\rho)/\big(T_{\text{off}}^{\text{frame}}(\rho) + T_s + T_w\big)$.
The traffic is Poissonian and the Ethernet link runs at 10 Gb/s
($b = T_s + T_w = 2.88$ μs + 4.48 μs, as mandated by the IEEE 802.3az
standard [3]), with packet sizes between 64 and 9000 bytes.²
It can be seen that $h''(\rho) > 0$ in the region of interest; thus
$h(\rho)$ is convex and $E(\rho) = 1 - (1 - \sigma_{\text{off}})(1 - \rho)\,h(\rho)$, with
$0 < \rho, \sigma_{\text{off}} < 1$, is concave.
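The content of Fig. 1 can be spot-checked numerically. The sketch below evaluates $h''(\rho)$ with a central finite difference for the Poisson frame-transmission idle time (9), assuming a 10 Gb/s link, the 10GBASE-T transition times, a 1% load grid and a handful of packet sizes in the 64 to 9000 byte range; it is a sanity check, not the procedure used to draw the figure.

```python
import math

T_S, T_W = 2.88e-6, 4.48e-6      # 10GBASE-T transition times (s)
B = T_S + T_W                    # b in Proposition 2
LINK_RATE = 10e9                 # b/s


def t_off_frame_poisson(rho, mu):
    """Eq. (9): mean idle time, frame transmission, Poisson arrivals."""
    lam = mu * rho               # packet arrival rate (packets/s)
    return math.exp(-lam * T_S) / lam


def h(rho, mu):
    f = t_off_frame_poisson(rho, mu)
    return f / (f + B)


def h_second(rho, mu, step=1e-4):
    """Central finite-difference estimate of h''(rho)."""
    return (h(rho + step, mu) - 2 * h(rho, mu) + h(rho - step, mu)) / step ** 2


packet_sizes = [64, 128, 256, 512, 1000, 1500, 4000, 9000]   # bytes (assumed grid)
loads = [k / 100 for k in range(1, 100)]                      # 0.01 .. 0.99
violations = [(size, rho) for size in packet_sizes for rho in loads
              if h_second(rho, LINK_RATE / (8 * size)) <= 0]
print("points with h'' <= 0:", violations)
```

An empty list agrees with the analytical result (11): $h$ is convex over the whole grid, so $E(\rho)$ is concave there.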

2) General traffic distributions: For unknown traffic
distributions we must resort to the approximation given by (8), so
we let $f(\rho) = 1/(\mu\rho) - T_s$ and $b = T_s + T_w$. Since
$f'(\rho) = -1/(\mu\rho^2)$ and $f''(\rho) = 2/(\mu\rho^3)$, we can
immediately substitute in $f''(\rho)(f(\rho) + b) > 2(f'(\rho))^2$ and
get

$$\frac{2}{\mu\rho^3}\left(\frac{1}{\mu\rho} + T_w\right) > \frac{2}{\mu^2\rho^4}. \qquad (13)$$

After some straightforward cancellations, this is

$$\frac{2T_w}{\mu\rho^3} > 0, \qquad (14)$$

which is obviously true.


B. Burst Transmission 

Burst transmission is a simple modification of frame
transmission that waits until a given number of packets $Q_w$ arrive
at the network interface before exiting idle mode. To avoid
excessive delays, a tunable parameter $T_{\max}$ limits
the wait for the $Q_w$-th frame since the first frame arrives. The
analysis of the burst transmission algorithm is more involved
because there is not one but two operating regimes
depending on the traffic load. Fortunately, [13] shows that the
two operating regimes (low and high traffic load, respectively)
can be neatly separated by the approximate traffic threshold

$$\rho^* \approx \frac{Q_w - 1}{\mu T_{\max}}, \qquad (15)$$

where $Q_w$ and $T_{\max}$ are the tunable parameters of the burst
transmission algorithm [11]. As in the previous Section, we
now check whether, with burst transmission, the
link energy consumption function is concave.

¹ $f(\rho)$ computes the average time spent by the interface in the idle state,
so it is reasonable to assume that it decreases as the traffic load grows.

² Although the standard Ethernet payload is limited to 1500 bytes, we
have also tested larger packet sizes to account for so-called jumbo frames.

Figure 1. Contour plot of $h''(\rho)$ when
$h(\rho) = T_{\text{off}}^{\text{frame}}(\rho)/\big(T_{\text{off}}^{\text{frame}}(\rho) + T_s + T_w\big)$
for a Poisson arrival process under the frame transmission energy-saving algorithm.

1) Low load regime, $\rho < \rho^*$: When the traffic load is low,
the timer expires after $T_{\max}$ seconds, before a backlog of
$Q_w$ packets can accumulate in the queue, and it is this timer
expiry that makes the interface exit the low power mode. The exact
expression for the expected sojourn time in the low-power state is (see [13])

$$T_{\text{off}}^{\text{burst,low}}(\rho) = \int_{0}^{\infty} (t + T_{\max} - T_s)\, f_{\rho,T_e}(t)\, dt. \qquad (16)$$


When $f_{\rho,T_e}(t)$ is unknown, according to [13], (16) can be
approximated by

$$T_{\text{off}}^{\text{burst,low}}(\rho) \approx \frac{1}{\mu\rho} + T_{\max} - T_s. \qquad (17)$$

As in the frame transmission algorithm, there exist closed
expressions for $T_{\text{off}}^{\text{burst,low}}(\rho)$ for some distributions and,
remarkably, (17) is exact with Poissonian arrivals.

Proving the concavity of $E(\rho)$ in this case is direct. First,
note that $f(\rho) = 1/(\mu\rho) - T_s + T_{\max}$, so that the
derivatives $f'$ and $f''$ are the same as in the frame transmission
case. Hence, plugging (17) into the condition
$f''(\rho)(f(\rho) + b) > 2(f'(\rho))^2$, the inequality reduces to
$2(T_{\max} + T_w)/(\mu\rho^3) > 0$, which always holds.

2) High load regime, $\rho > \rho^*$: When the traffic load is high,
the packet burst is much more likely to reach its maximum
size $Q_w$ before the timer expires. Now, the expected sojourn
time in the low-power state is given by

$$T_{\text{off}}^{\text{burst,high}}(\rho) = \int_{T_s}^{\infty} (t - T_s)\, f_{\rho,Q_w}(t)\, dt, \qquad (18)$$

where, as usual, $f_{\rho,Q_w}(t)$ is the probability density function of
the $Q_w$-th frame arrival epoch after the interface has entered
the idle mode. When the density is unknown, according to [13]
the expected time can be well approximated by

$$T_{\text{off}}^{\text{burst,high}}(\rho) \approx \frac{Q_w}{\mu\rho} - T_s, \qquad (19)$$

whereas the exact formula for the case of Poissonian arrivals
is

$$T_{\text{off}}^{\text{burst,high}}(\rho) = \frac{\Gamma(Q_w + 1, \mu\rho T_s) - \mu\rho T_s\,\Gamma(Q_w, \mu\rho T_s)}{\mu\rho\,\Gamma(Q_w)}. \qquad (20)$$

Here, $\Gamma(\cdot)$ and $\Gamma(\cdot,\cdot)$ are the complete and upper incomplete Gamma
functions [30], respectively.

In order to prove that Poissonian arrivals lead to concave
energy consumption functions, simply substitute (20) into (6)
to obtain, after some straightforward calculations, the inequality

$$\begin{aligned}
&\frac{\mu^2 e^{-\mu\rho T_s}}{(\mu\rho)^3\,\Gamma(Q_w)^2}\Big( T_s(\mu\rho T_s)^{Q_w}\big[\mu\rho\big((T_s+T_w)\Gamma(Q_w) - T_s\Gamma(Q_w,\mu\rho T_s)\big) + \Gamma(Q_w+1,\mu\rho T_s)\big] \\
&\quad + 2e^{\mu\rho T_s}\,\Gamma(Q_w+1,\mu\rho T_s)\big[(T_s+T_w)\Gamma(Q_w) - T_s\Gamma(Q_w,\mu\rho T_s)\big]\Big) > 0. \qquad (21)
\end{aligned}$$

The factor in front of the large parentheses is positive, so this
simplifies to

$$\begin{aligned}
&T_s(\mu\rho T_s)^{Q_w}\big[\mu\rho\big((T_s+T_w)\Gamma(Q_w) - T_s\Gamma(Q_w,\mu\rho T_s)\big) + \Gamma(Q_w+1,\mu\rho T_s)\big] \\
&\quad + 2e^{\mu\rho T_s}\,\Gamma(Q_w+1,\mu\rho T_s)\big[(T_s+T_w)\Gamma(Q_w) - T_s\Gamma(Q_w,\mu\rho T_s)\big] > 0, \qquad (22)
\end{aligned}$$

which holds true because

$$\mu\rho(T_s + T_w)\Gamma(Q_w) > \mu\rho T_s\Gamma(Q_w) > \mu\rho T_s\Gamma(Q_w, \mu\rho T_s) \qquad (23)$$

as a consequence of elementary properties of the Gamma
functions (namely, $\Gamma(Q_w, x) < \Gamma(Q_w)$ for $x > 0$). This implies
that all the summands in the left side of (22) are positive, and (6) is satisfied.

The last step is to prove concavity for the general
approximation (19). The change of variable $m = \mu/Q_w$ formally
transforms (19) into (8), with m in place of $\mu$. Since $m > 0$,
following the same steps as in frame transmission, one concludes
that (19) also fulfills condition (6). Hence, the link energy
consumption function $E(\cdot)$ is concave with burst transmission
in the high-load regime, regardless of the traffic arrival pattern.
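The concavity claim for the high-load regime can also be spot-checked numerically from (4) and (20). A minimal sketch, assuming the Section VI parameters ($T_s$, $T_w$, $\sigma_{\text{off}} = 0.1$, $T_{\max} = 100$ μs, $Q_w = 20$) and 1000-byte frames on a 10 Gb/s link. For an integer $Q_w$ the upper incomplete Gamma function has the closed form $\Gamma(Q_w, x) = (Q_w-1)!\,e^{-x}\sum_{k=0}^{Q_w-1} x^k/k!$, so only the standard library is needed. A finite-difference second derivative of $E(\rho)$ should be negative for every $\rho > \rho^*$.

```python
import math

# Assumed parameters (Section VI setup): 10GBASE-T, 1000-byte frames
T_S, T_W = 2.88e-6, 4.48e-6
SIGMA_OFF = 0.1
MU = 10e9 / (1000 * 8)        # frames per second at full load
Q_W, T_MAX = 20, 100e-6       # burst transmission parameters


def upper_gamma_int(s, x):
    """Upper incomplete Gamma(s, x) for an integer s >= 1 (Erlang tail identity)."""
    term = total = 1.0
    for k in range(1, s):
        term *= x / k
        total += term
    return math.factorial(s - 1) * math.exp(-x) * total


def t_off_burst_high(rho):
    """Eq. (20): mean idle time, burst transmission, Poisson arrivals."""
    lam = MU * rho
    x = lam * T_S
    num = upper_gamma_int(Q_W + 1, x) - x * upper_gamma_int(Q_W, x)
    return num / (lam * math.factorial(Q_W - 1))


def energy(rho):
    """Eq. (4) with T_off given by Eq. (20)."""
    f = t_off_burst_high(rho)
    return 1 - (1 - SIGMA_OFF) * (1 - rho) * f / (f + T_S + T_W)


def second_difference(fn, rho, step=1e-3):
    return (fn(rho + step) - 2 * fn(rho) + fn(rho - step)) / step ** 2


rho_star = (Q_W - 1) / (MU * T_MAX)           # Eq. (15): 0.152 with these values
grid = [rho_star + 0.01 + k * (0.98 - rho_star) / 200 for k in range(201)]
worst = max(second_difference(energy, rho) for rho in grid)
print(f"rho* = {rho_star:.3f}; largest E'' over the high-load grid = {worst:.3e}")
```

The grid stays strictly inside $(\rho^*, 1)$, so the finite differences never leave the high-load regime where (20) applies.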

A numerical illustration of the concavity is shown in Fig. 2,
which depicts the contour plots of $h''(\rho)$ for a 10 Gb/s Ethernet
link as the traffic load and the packet size vary.

Figure 2. Contour plots of $h''(\rho)$ for a Poisson arrival process under the burst transmission energy-saving algorithm.


V. Delay Control

According to the previous sections, a straightforward
application of a water-filling algorithm to share traffic among the
bundle links provides maximum energy savings. However, if
proper care is not taken, packet delay can grow uncontrolled,
as we explain next.

From a practical point of view there are many ways to
implement a water-filling algorithm. For instance, one could
use separate queues for each link and only divert traffic to a new
link when the queue of the previous one overflows. Obviously,
this approach exhibits the greatest delay. A second option is to
limit the load factor of every link, and thus the delay, and divert
traffic when this threshold is reached. Its main drawback is that
no link is used at its full capacity, so the energy savings
are not maximum. Another option, at the opposite extreme, is
to have a common bundle queue and zero-length queues at the
links. In this case, a new link is used if, when a packet arrives,
the previous link is busy transmitting a packet. The problem
is that, if the traffic load is not high enough, we will find
the first link idle while the second one is transmitting, which
goes against the idea of the water-filling algorithm.

We propose a simple dynamic water-filling algorithm that
can control the average delay while keeping the utilization factor
of the links close to 1. The algorithm has one configuration
parameter, the expected delay ($d_e$), and $N + 1$ state variables,
with N the number of links in the bundle, as it just keeps a
record of the short-term average delay ($d_{av}$), calculated with an
exponentially weighted moving average, and the current queue
length of each link measured in time units ($q_i$, $i = 1, \ldots, N$).
More precisely, when a new packet is about to get queued in
queue i, the current average delay value is updated as

$$d_{av} = \beta\, q_i^j + (1 - \beta)\, d_{av}, \qquad 0 < \beta < 1, \qquad (24)$$

where $q_i^j$ is the value of $q_i$ when the j-th packet arrives,
calculated as the amount of traffic stored in the i-th queue divided by the
link capacity, and $\beta$ is a gain factor. Updating $d_{av}$ on packet
arrivals avoids the need to record and store the arrival time of
every packet in the system.

    function packet_arrival
        if (d_av < d_e)
            return enqueue(Links[1])
        foreach (l in Links)
            if (q_l < d_e)
                return enqueue(l)
        return enqueue(Links[N])

Listing 1. Dynamic water-filling algorithm.

The algorithm works as follows. Each link in the bundle
is assumed to have its own queue, so whenever a new packet
arrives, the algorithm decides which queue should store it. To
do so, the expected delay is compared with the current average
delay. If $d_{av} < d_e$, the packet is stored in the queue of the first
link. Otherwise, a sequential search is started for a
queue whose length is smaller than the expected delay. If
no such queue is found, the packet is stored in the last queue. This
is all summarized in Listing 1.
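A minimal, runnable sketch of Listing 1 together with update (24) follows. It is a simplified fluid model, not the ns-2 implementation used in Section VI: queues are tracked as backlogs in bits, every link drains at its full capacity, and EEE transition times are ignored. The class, its methods and the 16 Gb/s workload are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicWaterFill:
    """Sketch of Listing 1 with the EWMA update of Eq. (24).

    Backlogs are kept in bits; q_i is obtained by dividing by the link
    capacity, so it is expressed in seconds, as in the text.
    """
    capacities: list            # link capacities in b/s, Links[1]..Links[N]
    expected_delay: float       # d_e in seconds
    beta: float = 0.1           # gain factor in (24)
    avg_delay: float = 0.0      # d_av
    backlog_bits: list = field(default_factory=list)

    def __post_init__(self):
        if not self.backlog_bits:
            self.backlog_bits = [0.0] * len(self.capacities)

    def queue_delay(self, i):
        """q_i: traffic stored in queue i divided by its link capacity (s)."""
        return self.backlog_bits[i] / self.capacities[i]

    def choose_link(self):
        """Decision logic of Listing 1 (0-based link indices)."""
        if self.avg_delay < self.expected_delay:
            return 0
        for i in range(len(self.capacities)):
            if self.queue_delay(i) < self.expected_delay:
                return i
        return len(self.capacities) - 1

    def packet_arrival(self, size_bits):
        i = self.choose_link()
        # Eq. (24): update d_av with q_i as seen by the packet about to be queued
        self.avg_delay = self.beta * self.queue_delay(i) + (1 - self.beta) * self.avg_delay
        self.backlog_bits[i] += size_bits
        return i

    def drain(self, dt):
        """Let every link transmit for dt seconds at its full capacity."""
        for i, cap in enumerate(self.capacities):
            self.backlog_bits[i] = max(0.0, self.backlog_bits[i] - cap * dt)


# Hypothetical workload: 1000-byte packets every 0.5 us (16 Gb/s) on 4 x 10 Gb/s
wf = DynamicWaterFill([10e9] * 4, expected_delay=10e-6)
counts = [0] * 4
for _ in range(20_000):
    counts[wf.packet_arrival(8000)] += 1
    wf.drain(0.5e-6)
print("packets per link:", counts)
```

In this toy run most packets should go to the first link, with the excess spilling over to the second one, which is the water-filling behavior the algorithm aims for while keeping the queues near $d_e$.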

VI. Results 

We have carried out several experiments to assess the
effectiveness of our proposed sharing strategy. We employed
the ns-2 network simulator with an added module for
simulating IEEE 802.3az links, available for download at [31].
The simulated bundles have a varying number of 10 Gb/s links
with 10GBASE-T interfaces, so $T_s = 2.88$ μs, $T_w = 4.48$ μs
and $\sigma_{\text{off}} = 0.1$, in accordance with several estimates provided
by different manufacturers during the standardization process
of the IEEE 802.3az standard. For the burst transmission
simulations we set $T_{\max} = 100$ μs and $Q_w = 20$ frames,
so that $Q_w > \mu T_s = 3.6$ frames, as recommended in [13].

A. Model Validation 

The first set of experiments tests all possible traffic sharing
alternatives in a simple 2-link bundle fed with
synthetic traffic. For the experiments we used a fixed frame
size of 1000 bytes and a varying arrival rate, so that the
aggregated load ranged between 25 and 175%. Then, for each
load we modified the share between the two links and, for each
share, we ran five simulations with different random seeds and
a duration of ten seconds.

Figure 3 shows the total energy consumption of the bundle
versus the traffic load on the second link for Poisson traffic
with both the frame and burst transmission algorithms. For
clarity, we take advantage of the symmetry of the problem
and only represent the results where the load on the second link
is smaller than that on the first. Thus, for each experiment
the leftmost value represents the water-fill algorithm, with
most of the traffic on the first link, while the rightmost value
corresponds to an equal share of traffic between both links.
Figure 3 clearly shows that there is very little variance
among the different simulations for the same share and load
and, at the same time, that the results match those provided
by the model, plotted with continuous lines in the graph. It is
also easy to see that energy consumption increases with the
traffic load on the second link: the closer the loads of both
links are, the higher the energy needs. In fact, the minimum
consumption is obtained when most load is concentrated on
a single link, as predicted. Finally, we also observe that the
benefit of aggregating load on a single link is much greater for
frame than for burst transmission. This is a consequence of the
fact that the energy profile of the burst transmission algorithm
is more linear [7]. Thus, there is less sensitivity to how the
traffic load is shared among the links of the bundle (note
that total energy consumption shows little variation when the
traffic share is modified in Fig. 3(b)). Also, as expected, burst
transmission needs less energy than frame transmission.
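The trend of the model curves in Fig. 3(a) can be reproduced with a few lines of code. A minimal sketch, assuming frame transmission with Poisson arrivals (Eqs. (4) and (9)), the 10GBASE-T constants above, 1000-byte frames and an aggregate load of 100% of one link; it only evaluates the analytical model, not the ns-2 simulations, and the function names are hypothetical.

```python
import math

T_S, T_W, SIGMA_OFF = 2.88e-6, 4.48e-6, 0.1   # 10GBASE-T values from this section
MU = 10e9 / (1000 * 8)   # 1000-byte frames on a 10 Gb/s link, frames per second


def link_energy(rho):
    """Eq. (4) with the frame-transmission Poisson idle time of Eq. (9)."""
    if rho == 0:
        return SIGMA_OFF                      # the link never leaves the idle mode
    f = math.exp(-MU * rho * T_S) / (MU * rho)
    return 1 - (1 - SIGMA_OFF) * (1 - rho) * f / (f + T_S + T_W)


def two_link_energy(total, second):
    """Bundle energy when `second` of the normalized `total` load goes to link 2."""
    return link_energy(total - second) + link_energy(second)


total = 1.0                                     # 100% of one link, i.e. 10 Gb/s
shares = [k * total / 20 for k in range(11)]    # load on link 2, from 0 to total/2
curve = [two_link_energy(total, s) for s in shares]
for s, e in zip(shares, curve):
    print(f"load on link 2 = {s:.2f}  ->  E_b = {e:.4f}")
```

Under these assumptions $E_b$ grows monotonically from 1.1 (all traffic on the first link) to about 1.97 (equal share), matching the qualitative behavior described above.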

The results for Pareto traffic (with the shape factor α set to
2.5)³ are plotted in Fig. 4. Although the performance curves
are not as smooth as for Poisson traffic, the previous
conclusions still hold. Again, the minimum consumption is
obtained when most of the traffic is on a single link, and it
increases as the traffic on the second link increases. At the
same time, the frame transmission algorithm benefits more
than the burst transmission one.

Our second experiment compares the overall energy
consumption of an Ethernet bundle for the full range of possible
incoming traffic demands and two different sharing methods.
The first spreads the traffic evenly across all the constituent
links, denoted in the results by equitable, while the second
is the naive water-filling method. Traffic follows a Poisson
distribution and the frame size is 1000 bytes, as in the
previous experiment. Figure 5 displays both the experimental
and analytic results for two-, four- and eight-link aggregates.
Again, the frame transmission algorithm benefits more from the
water-fill sharing algorithm than burst transmission does. Further,
as the number of links in the bundle increases, the energy demands
of frame transmission, when using the water-fill procedure,
approach those of burst transmission.

³ Pareto distributions must be characterized with a shape parameter α greater
than 2 to have a finite variance. However, the greater the α parameter is, the
shorter the fluctuations, so a value of 2.5 is a good compromise to have finite
variance along with significant fluctuations.

B. Dynamic Water-filling Algorithm 

The next set of experiments tests the behavior of the
dynamic water-filling algorithm. For these simulations we employed
real traffic traces captured on Internet backbones. The
traffic comes from the publicly available passive monitoring
CAIDA dataset from 2013 [32], which provides anonymized
traces from a 10 Gb/s Internet backbone. We used one of these
traces to feed traffic to a simulated 4-link bundle made of
10GBASE-T interfaces. Of all the available traces, we
chose one with a relatively high demand of about 6 Gb/s on
average. As that load is still quite low for our simulated bundle
of 40 Gb/s, we made new traces of approximately 12, 18, 24
and 30 Gb/s by combining traffic from additional independent
adjacent traces. For this we concatenated the traces and then
reduced the inter-arrival times by a constant factor (2, 3, 4
and 5, respectively). We proceeded in this manner to keep any
existing autocorrelation in the final traces. Finally, we
chose β = 0.1 as the gain factor in (24).
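The trace construction described above can be sketched as follows, assuming each trace has already been parsed into a time-ordered list of (timestamp in seconds, size in bytes) tuples; the CAIDA traces themselves are distributed as packet captures, and the function names here are hypothetical.

```python
def speed_up(traces, factor):
    """Concatenate time-adjacent traces and divide every inter-arrival time by `factor`.

    Each trace is a list of (timestamp_s, size_bytes) tuples in time order.
    """
    packets = [pkt for trace in traces for pkt in trace]
    t0 = packets[0][0]
    # Scaling every offset from t0 divides each inter-arrival gap by the same
    # factor, so packet order and relative spacing are left unchanged.
    return [(t0 + (ts - t0) / factor, size) for ts, size in packets]


def mean_rate_bps(packets):
    """Average bit rate between the first and the last packet of a trace."""
    duration = packets[-1][0] - packets[0][0]
    return 8 * sum(size for _, size in packets) / duration


toy = [[(0.0, 100), (1.0, 100)], [(2.0, 100), (4.0, 100)]]
print(speed_up(toy, 2), mean_rate_bps(speed_up(toy, 2)))
```

For instance, concatenating two adjacent traces of about 6 Gb/s each and applying a factor of 2 yields a trace of roughly 12 Gb/s that spans about the duration of a single original trace.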

The first experiment verifies that the proposed dynamic
algorithm is in fact able to control the average delay. For
this we fed all the traffic traces to a 4-link bundle
and configured the algorithm for different expected delays.
The results are plotted in Fig. 6.⁴ There is an almost perfect
match between the configured and the
measured average delay for values greater than the transition
times of the EEE links. The exception is the 6 Gb/s trace,
whose delay stays below 4 μs. This is expected, as the queue
cannot grow larger when the capacity of a single link is greater
than the offered traffic. The simulation with the 12 Gb/s trace
shows a small drift of the average delay but, in any case, the
average delay is kept below the configured delay. This error in
the 12 Gb/s experiment occurs because (24) can overestimate the
average queuing delay when waiting time samples from lightly used
queues are few and far between, so the samples from the
first queue get over-represented. Although omitted for brevity,
decreasing the β value lessens the drift.

The second experiment shows the variation of power
consumption versus expected delay. The results are shown in
Fig. 7. When the expected delay is too low, all links are used
simultaneously, and the power savings are minimal. However,
as the allowed delay increases, most of the traffic is transmitted
by the first links and, despite the fact that all of them are
powered on, we achieve large power savings thanks to the
concavity of the cost function. It is important to notice that the
maximum energy savings are already obtained starting from
low delay target values. This allows the algorithm to be deployed
even in networks used by delay-sensitive applications.

⁴ Results for burst transmission have been omitted for the sake of brevity,
but show a similar behavior.

Figure 3. Results for a 2-link bundle with Poisson traffic as a function of excess traffic load on the second link: (a) frame transmission, (b) burst transmission.

Figure 4. Results for a 2-link bundle with Pareto traffic as a function of excess traffic load on the second link. Theoretical values omitted, as there is no
closed form formula for Pareto arrivals [13].

In the last experiment we compared the results
obtained when sharing the traffic with three different strategies:
spreading the traffic evenly across the four links, which we call
equitable; a naive implementation of the water-fill algorithm;
and, finally, the dynamic water-fill algorithm with a target
delay of 10 μs for the frame transmission algorithm
and 20 μs for the burst one.⁵ For the naive implementation we
constrained the traffic load on any link to 90% to avoid
excessive buffering.

The exact traffic rate of each trace and the different shares 
are detailed in Table I. For the equitable and the naive 
water-fill they have been determined beforehand, but for the 
dynamic algorithm the table lists the results obtained via 
simulation. The results for both the frame transmission and the 
burst transmission algorithms are depicted in Fig. 8. In every 
case the frame transmission algorithm needs more energy 
than the burst transmission one, but, at the same time, the 
savings resulting from applying the water-fill procedure are 


TABLE I
Average traffic fed into each link for the real traffic simulations (in Gb/s).

Total  Strategy          Link #1  Link #2  Link #3  Link #4
 6.21  Equitable            1.55     1.55     1.55     1.55
       Naive water-fill     6.21        0        0        0
       Dyn. frame           6.21        0        0        0
       Dyn. burst           6.18     0.03        0        0
12.60  Equitable            3.15     3.15     3.15     3.15
       Naive water-fill        9     3.60        0        0
       Dyn. frame           9.44     3.08     0.07        0
       Dyn. burst           9.50     2.67     0.39     0.04
18.81  Equitable            4.70     4.70     4.70     4.70
       Naive water-fill        9        9     0.81        0
       Dyn. frame           9.94     7.12     1.73     0.02
       Dyn. burst           9.97     6.82     1.77     0.25
25.08  Equitable            6.27     6.27     6.27     6.27
       Naive water-fill        9        9     7.08        0
       Dyn. frame             10     9.16     5.12     0.80
       Dyn. burst             10     9.13     4.78     1.17
31.40  Equitable            7.85     7.85     7.85     7.85
       Naive water-fill        9        9        9     4.40
       Dyn. frame             10     9.82     7.94     3.64
       Dyn. burst             10     9.83     7.74     3.83


^In burst transmission, power savings peak at a higher delay value than in frame transmission. This is expected, as burst transmission adds delay by queuing frames before waking up a link.














(a) Two-link bundle (b) Four-link bundle (c) Eight-link bundle

Figure 5. Normalized global consumption of a bundle link for the different idle mode governors.

also greater. In fact, once the water-fill procedure is applied, there is usually very little difference between the consumption of the two EEE algorithms. As expected, the equitable share draws more energy than the other two shares, and the water-fill share produces the best results. Finally, the dynamic water-fill algorithm improves the results, but not substantially.

Figure 6. Obtained average delay versus configured delay for the dynamic water-fill algorithm.

Figure 7. Power consumption versus expected delay for the dynamic water-fill algorithm.

Figure 8. Energy consumption with real traffic traces when employing different strategies to share the traffic in a 4-link bundle.

Figure 9. Average queuing delay with real traffic traces when employing different strategies to share the traffic in a 4-link bundle.

We have also measured the impact of the different algorithms on queuing delay. Figure 9 shows the average queuing delay suffered by the traffic in the previous experiment. As is the case for single EEE links [13], burst transmission always causes more delay than frame transmission. We also find that the different sharing methods affect the queuing delay differently. In the 6 Gb/s case, the equitable share produces the highest delays with burst transmission: there is relatively little traffic on every link, so packets have to wait for further packets to arrive before being transmitted. In every other case, the naive water-fill algorithm produces the longest packet delays, as at least one queue is driven near its full capacity. Finally, the dynamic algorithm produces stable delays near its target (10 µs for frame and 20 µs for burst transmission), in the same range as those of the equitable share.

VII. Conclusions 

This paper presents an optimal, yet simple, procedure for distributing traffic load among the links of a bundle that minimizes energy consumption when individual links employ an EEE algorithm. As explained, the maximum energy savings are obtained when traffic is sent on a link only if all the previous ones in the aggregate are already being used at their maximum allowed load. The paper proves the optimality of the procedure for typical energy cost functions of individual Ethernet links.

The proposed procedure is independent of the energy saving algorithm used in the links, whether it is the simple frame transmission algorithm or the more efficient burst transmission one. Moreover, we found that as the number of links forming the bundle increases, the difference in total energy consumption between the two algorithms vanishes when using our sharing procedure. Thus, for bundles made up of many links it is advisable to use the simpler frame transmission algorithm in the links, as it both reduces complexity and adds less latency to the transmitted frames.

We have also explored several alternatives for a practical implementation of the water-filling idea, and presented a simple one that keeps the average delay at a configurable target value while minimizing overall energy consumption. The algorithm requires little memory and computational power, so a vendor can implement it just by modifying the firmware of the Ethernet line card. However, as the algorithm needs the queue occupancy of each port to classify incoming packets, an OpenFlow implementation is not currently possible, because the current specification [33] does not define the needed counters. Future research could explore extending the specification to give the user the fine-grained control of the transmission ports that our proposal requires.
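To make the dependency on per-port queue counters concrete, the following is a purely hypothetical sketch; it is not claimed to be the paper's dynamic water-fill algorithm. It assumes a classifier that sends each packet to the lowest-index port whose backlog, converted into an estimated queuing delay at line rate, is below the configured target, so later links are used only when the earlier ones are congested. The `Port` class, its fields, and the delay estimate are illustrative assumptions, and queue draining is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Port:
    rate_bps: float        # line rate of the link (hypothetical field)
    queued_bytes: int = 0  # current queue occupancy in bytes (hypothetical counter)

    def est_delay_s(self) -> float:
        # Time needed to drain the current backlog at line rate.
        return self.queued_bytes * 8 / self.rate_bps


def select_port(ports: List[Port], target_delay_s: float) -> int:
    """Return the first port whose estimated queuing delay is below the
    target; if every port exceeds it, fall back to the least-delayed one."""
    for i, port in enumerate(ports):
        if port.est_delay_s() < target_delay_s:
            return i
    return min(range(len(ports)), key=lambda i: ports[i].est_delay_s())


def enqueue(ports: List[Port], size_bytes: int, target_delay_s: float) -> int:
    """Classify one packet and account for it in the chosen port's queue."""
    i = select_port(ports, target_delay_s)
    ports[i].queued_bytes += size_bytes
    return i
```

For example, with 10 Gb/s ports and a 10 µs target, a port is skipped once its backlog reaches 12,500 bytes, which is exactly the per-port occupancy information this kind of rule cannot obtain without suitable counters.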

Finally, we have tested our procedure with both synthetic and real traffic traces. In all cases, the results match our expectations, with the best results obtained when the proposed sharing algorithm is employed, reducing energy consumption by as much as 50%.

Appendix A 
Proof of proposition 1 

In this Section, we prove that, for the particular case of equal cost functions, the solution to the optimal allocation problem is a simple sequential water-filling algorithm: each link's capacity is fully used before sending traffic through a new, idle link.

We assume $X < C_\Sigma = \sum_{i=1}^{N} C_i$, otherwise the solution is trivial. It is easy to see that the constraints define a convex region $\mathcal{R}$. Since the objective function is concave, it attains its minimum at one of the extreme points of $\mathcal{R}$, namely $x_i = C_i$ for $i \in \mathcal{T} \subset [1:N]$, $0 \le x_j \le C_j$ for one $j \in [1:N]$, and $x_k = 0$ for all $k \in [1:N] \setminus (\mathcal{T} \cup \{j\})$. In fact, when all the cost functions are equal, the optimal traffic allocation is to use the links in decreasing order of capacity. Assume, without loss of generality, that $C_1 \ge C_2 \ge \cdots \ge C_N$.^ Fix two links $i$ and $j$, $i > j$, and assume that a feasible solution is the vector $\mathbf{x}^* = (x_1^*, \dots, x_N^*)$. Then, since $E(\cdot)$ is a concave function it is also subadditive, and for $i > j$ and $\delta < \min\{x_i^*, C_j - x_j^*\}$ we have

$$E(x_i^*) + E(x_j^*) \ge E(x_i^* - \delta) + E(x_j^* + \delta). \tag{25}$$

Therefore, the vector $\tilde{\mathbf{x}} = (x_1^*, \dots, x_j^* + \delta, \dots, x_i^* - \delta, \dots, x_N^*)$ is at least as good a solution as $\mathbf{x}^*$. Iterating this argument as many times as necessary, it is immediate to conclude that

$$x_i^* = C_i \quad \text{for } i = 1, \dots, s-1, \tag{26}$$

$$0 \le x_s^* = X - \sum_{i=1}^{s-1} C_i < C_s, \tag{27}$$

$$x_j^* = 0 \quad \text{for } j = s+1, \dots, N \tag{28}$$

is the optimal solution, where $s$ is the smallest index such that $\sum_{i=1}^{s} C_i > X$.

To see (25), recall that for a concave function $f$ and three ordered points $a < b < c$ it holds

$$\frac{f(b) - f(a)}{b - a} \ge \frac{f(c) - f(a)}{c - a}. \tag{29}$$

Just let $t = (b - a)/(c - a)$, so that $b = (1 - t)a + tc$. By the definition of concavity

$$f(b) \ge (1 - t) f(a) + t f(c), \tag{30}$$

which is (29). Similarly, for $a < b < c$

$$\frac{f(c) - f(a)}{c - a} \ge \frac{f(c) - f(b)}{c - b}, \tag{31}$$

and combining (29) and (31) gives

$$\frac{f(b) - f(a)}{b - a} \ge \frac{f(c) - f(b)}{c - b}. \tag{32}$$

Now, use inequality (32) twice, over the tuples $(x_i^* - \delta, x_i^*, x_j^*)$ and $(x_i^*, x_j^*, x_j^* + \delta)$, to conclude (25). □

^If some links are of the same capacity, each permutation of the links leads to an equivalent solution of the problem.
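As a numerical sanity check of Proposition 1 (not part of the original proof), the sketch below brute-forces every allocation of $X = 1.9$ over three links of capacities 1.0, 0.8 and 0.6, on a grid of step 0.01, using the hypothetical concave cost $E(x) = \sqrt{x}$ for every link. Under these assumptions the minimum found by exhaustive search coincides with the sequential water-filling point (1.0, 0.8, 0.1).

```python
import math
from itertools import product

UNIT = 0.01  # grid resolution: one integer unit = 0.01 of load


def energy(x_units: int) -> float:
    # Hypothetical concave per-link cost E(x) = sqrt(x), equal for every link.
    return math.sqrt(x_units * UNIT)


def total_cost(alloc) -> float:
    return sum(energy(x) for x in alloc)


def water_fill(total: int, caps) -> list:
    """Sequential water-filling over links in decreasing order of capacity."""
    alloc, remaining = [], total
    for c in sorted(caps, reverse=True):
        x = min(remaining, c)
        alloc.append(x)
        remaining -= x
    return alloc


def brute_force(total: int, caps):
    """Exhaustive search over every integer allocation that respects the caps."""
    *head, last_cap = caps
    best, best_cost = None, math.inf
    for xs in product(*(range(c + 1) for c in head)):
        x_last = total - sum(xs)
        if 0 <= x_last <= last_cap:
            alloc = (*xs, x_last)
            cost = total_cost(alloc)
            if cost < best_cost:
                best, best_cost = alloc, cost
    return best, best_cost


caps = (100, 80, 60)  # C_1 >= C_2 >= C_3, i.e. 1.0, 0.8 and 0.6
X = 190               # total load, i.e. 1.9
wf = water_fill(X, caps)
best, best_cost = brute_force(X, caps)
print("water-fill :", wf, round(total_cost(wf), 4))
print("brute force:", best, round(best_cost, 4))
```

Any other concave cost shared by all links could be substituted for `energy`; the square root is used only because it is simple and strictly concave.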


Appendix B 

Proof of proposition 2 

Consider the auxiliary function

$$u(x) = 1 - t(x) = \frac{a(1 - x) f(x)}{f(x) + b} = g(x) h(x), \tag{33}$$

where $g(x) = a(1 - x)$ and $h(x) = f(x)/(f(x) + b)$. Strict concavity of $t(x)$ is equivalent to $u(x)$ being strictly convex or, alternatively, to $u''(x) > 0$. Taking the second derivative of $u(x)$ we get

$$u''(x) = g(x) h''(x) - 2a h'(x), \tag{34}$$

because $g'(x) = -a$ and $g''(x) = 0$. So, $u(x)$ is strictly convex if and only if $g(x) h''(x) > 2a h'(x)$. But

$$h'(x) = \frac{b f'(x)}{(f(x) + b)^2} < 0, \tag{35}$$

since we assumed $f(x)$ to be decreasing. With $g(x) \ge 0$ for $x \in [0, 1]$, $a > 0$ and (35), (34) shows that $h(x)$ convex implies $u(x)$ convex. Finally,

$$h''(x) = b\,\frac{f''(x)(f(x) + b) - 2(f'(x))^2}{(f(x) + b)^3}, \tag{36}$$

and $h(x)$ is convex (and hence $t(x)$ is concave) if and only if $f''(x)(f(x) + b) \ge 2(f'(x))^2$, since $f(x)$ is nonnegative and $b > 0$. □
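The derivative algebra in (34) to (36) can be spot-checked numerically. The sketch below assumes, purely for illustration, $f(x) = e^{-2x}$, $a = 1$ and $b = 1$. This $f$ is decreasing and nonnegative, and the condition $f''(x)(f(x) + b) \ge 2(f'(x))^2$ reduces to $b \ge e^{-2x}$, which holds on $[0, 1]$. The code compares the closed forms with central finite differences and checks that the second differences of $t(x)$ are negative.

```python
import math

A, B = 1.0, 1.0  # hypothetical constants a > 0 and b > 0


def f(x: float) -> float:
    return math.exp(-2 * x)        # hypothetical decreasing, nonnegative f


def fp(x: float) -> float:
    return -2 * math.exp(-2 * x)   # f'(x)


def fpp(x: float) -> float:
    return 4 * math.exp(-2 * x)    # f''(x)


def h(x: float) -> float:
    return f(x) / (f(x) + B)


def u(x: float) -> float:
    return A * (1 - x) * h(x)      # u(x) = g(x) h(x), eq. (33)


def t(x: float) -> float:
    return 1 - u(x)


def h1_closed_form(x: float) -> float:
    return B * fp(x) / (f(x) + B) ** 2                                   # eq. (35)


def h2_closed_form(x: float) -> float:
    return B * (fpp(x) * (f(x) + B) - 2 * fp(x) ** 2) / (f(x) + B) ** 3  # eq. (36)


def u2_closed_form(x: float) -> float:
    return A * (1 - x) * h2_closed_form(x) - 2 * A * h1_closed_form(x)   # eq. (34)


def second_diff(fun, x: float, step: float) -> float:
    """Central finite-difference estimate of fun''(x)."""
    return (fun(x + step) - 2 * fun(x) + fun(x - step)) / step ** 2


xs = [i / 100 for i in range(1, 100)]  # interior points of [0, 1]
err_h2 = max(abs(second_diff(h, x, 1e-4) - h2_closed_form(x)) for x in xs)
err_u2 = max(abs(second_diff(u, x, 1e-4) - u2_closed_form(x)) for x in xs)
t_concave = all(second_diff(t, x, 1e-3) < 0 for x in xs)
print(f"max error on (36): {err_h2:.1e}, on (34): {err_u2:.1e}, t concave: {t_concave}")
```

A finite-difference check of this kind only samples the chosen $f$; it illustrates the derivation rather than replacing it.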